A new VLSI design of a pipeline Reed-Solomon decoder is presented. The transform
decoding technique used in a previous design is replaced by a time domain algorithm.
A new architecture that implements this algorithm permits efficient pipeline
processing with minimum circuitry. A systolic array is also developed to perform erasure
corrections in the new design. A modified form of Euclid’s algorithm is implemented
by a new architecture that maintains the throughput rate with less circuitry. These
improvements result in both enhanced capability and a significant reduction in silicon
area, making it possible to build a pipeline (31,15) RS decoder on a single
VLSI chip.


I. Introduction 

Recently Brent and Kung (Ref. 1) suggested a systolic 
array architecture to compute the greatest common divisor 
(gcd) of two polynomials. Based on this idea a VLSI design of 
a pipeline Reed-Solomon decoder was developed (Ref. 2). The 
syndrome computation of this decoder for a 4-bit (15,9) RS 
code was implemented on a chip (Ref. 3). 

In the design of the chip for the above-mentioned decoder, 
three major problems arose: 

(1) While the architecture for syndrome computation took
(N - I) cells for an (N, I) RS code, the architecture
suggested in Ref. 2 required N identical cells to
implement the inverse transform. As a consequence,
for a long code such as the (255,223) RS code, the
inverse transform circuit would need 255 cells and be
quite large.


(2) The basic cell of the systolic array needed to perform
a modified form of Euclid’s algorithm occupied
considerable silicon area, approximately 60 times the size
of a syndrome computing cell. Since the decoding
algorithm in Ref. 2 required (N - I) such cells, the
entire systolic array needed much more silicon area
than desired.

(3) Erasure corrections became necessary and were not 
included in the original design. Hence the decoder 
required several modifications of the original architec- 
ture design in Ref. 2. 

To reduce the large circuit area required by the inverse 
transform operation it was decided to modify the original 
transform decoding algorithm. Also after considering the need 
for erasure correction, it was found that the decoding algo- 
rithm given in Ref. 4 could accommodate both requirements. 



In this algorithm the errata magnitudes are calculated in
the time domain and a Chien search is used to find the error
locations. The architecture of the new algorithm is designed
to operate sequentially in a pipeline, thereby enabling the
circuit size to grow with the error correcting capability (N - I)
instead of the code length N.

The systolic array designed originally for the modified 
form of Euclid’s algorithm could process polynomials con- 
tinuously (Ref. 2). However, in real-time RS decoding, there is 
a need to compute only one syndrome polynomial for each 
received codeword. If one takes advantage of this by a better 
utilization of multiplexing, the required pipeline throughput 
rate can be maintained by the use of fewer basic cells. 

In this article, an improved VLSI architecture over that in 
Ref. 2 is developed utilizing the above observations. A systolic 
array is also designed for the needed polynomial expansion 
used in the erasure polynomial computation. These new modi- 
fications result in both an enhanced capability and a signifi- 
cant reduction in silicon area without any loss in the pipeline 
throughput rate. 


II. The Decoder Architecture 

Let $N = 2^m - 1$ be the length of the (N, I) RS code over
$GF(2^m)$ with design distance d. Suppose that t errors and
s erasures occur, and $s + 2t \le d - 1$. The decoding procedure in
Ref. 4 is summarized as follows.

Let $X_i$ be an error location or an erasure location, and let
$\Lambda = \{X_i \mid X_i \text{ is an erasure location}\}$ and
$\lambda = \{X_i \mid X_i \text{ is an error location}\}$. Let $Y_i$ be the
corresponding errata magnitude and
$r = (r_0, r_1, \ldots, r_{N-1})$ be the received vector.

Step 1. Compute the syndrome polynomial $S(Z)$.

Step 2. Compute the erasure locator polynomial

$\Lambda(Z) = \prod_{X_i \in \Lambda} (Z - X_i)$ (2)

from $\Lambda$.

Step 3. Multiply $S(Z)$ and $\Lambda(Z)$ to obtain the Forney
syndrome polynomial

$T(Z) = S(Z)\,\Lambda(Z)$ (3)

Step 4. Compute the errata evaluator polynomial $A(Z)$
and the error locator polynomial $\lambda(Z)$ from $T(Z) = A(Z)/\lambda(Z)$
by the modified Euclid’s algorithm.

Step 5. Multiply $\Lambda(Z)$ and $\lambda(Z)$ to get the errata locator
polynomial

$P(Z) = \Lambda(Z)\,\lambda(Z)$ (4)

Step 6. Perform a Chien search on $\lambda(Z)$ to find the error
location set $\lambda$.

Step 7. Compute the errata magnitudes

$Y_k = \dfrac{A(X_k)}{\cdots}$ (5)

for $1 \le k \le s + t$ by evaluating $A(Z)$ and $P'(Z)$. Use sets $\lambda$
and $\Lambda$ to direct the additions of $Y_k$ to the received vector r.

The pipeline architecture of the RS decoder is shown in
Fig. 1. The decoder computes the syndrome polynomial
$S(Z)$ by the transform circuit given in Ref. 2. The erasure
information $\Lambda$ enters the decoder in the form of a binary
sequence.

The systolic array described in the next section expands
the factors of

$\Lambda(Z) = \prod_{X_i \in \Lambda} (Z - X_i)$

into the polynomial. Polynomial multiplications are performed
with a circuit described in Ref. 5. A new architecture is
developed which implements the modified Euclid’s algorithm
by operating on the product of $S(Z)$ and $\Lambda(Z)$. The resulting
error locator polynomial $\lambda(Z)$ is then multiplied by $\Lambda(Z)$,
thereby obtaining the errata locator polynomial $P(Z)$.

The derivative $P'(Z)$ of $P(Z)$ is obtained by dropping the
even-degree terms of $P(Z)$, since their derivatives vanish over
$GF(2^m)$. The errata magnitudes $Y_k$ are then calculated
by a field inversion and a number of multiplications.
Next the error locations are obtained in the form of a binary
sequence by the use of another polynomial evaluation circuit
which performs the Chien search on $\lambda(Z)$. This sequence of
error locations, together with the input erasure location
binary sequence, directs the addition of $Y_k$ to the received
message.
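
The rule for the derivative can be checked with a few lines of Python. The following is a minimal sketch, assuming a polynomial is stored as a list of field elements with the lowest degree first; the function name and this representation are chosen here for illustration and are not taken from the chip design. The term $p_i Z^i$ differentiates to $i\,p_i Z^{i-1}$, and in characteristic 2 the integer multiple $i\,p_i$ is $p_i$ for odd i and 0 for even i.

```python
def formal_derivative(p):
    """Formal derivative of a polynomial over GF(2^m).

    `p` lists coefficients lowest degree first (an assumed layout).
    Even-degree terms are dropped because i * p_i = 0 when i is even;
    odd-degree terms keep their coefficient and move down one degree.
    """
    kept = [c if i % 2 == 1 else 0 for i, c in enumerate(p)]
    return kept[1:]  # divide by Z: shift every term down one degree


# P(Z) = 3 Z^3 + 9 Z^2 + 7 Z + 5 (coefficients are arbitrary field elements)
print(formal_derivative([5, 7, 9, 3]))
```

No field multiplier is needed for this step, which is consistent with the derivative being obtained simply by selecting coefficients.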


III. A VLSI Design for Expanding the Erasure 
Locator Polynomial 

It is reasonable to assume that the erasure location information
is derived from outside the chip, possibly from a
convolutional decoder. Let it arrive serially in the form of 1’s and 0’s.
A simple circuit of the form shown in Fig. 2(a) first converts
this erasure data into a sequence of $\alpha^k$’s and 0’s, where $\alpha^k \in \Lambda$.

Given $\alpha^k \in \Lambda$, the computation of the erasure polynomial
demands the expansion of

$\Lambda(Z) = \prod_{\alpha^k \in \Lambda} (Z - \alpha^k) = (Z - \alpha^{k_1})(Z - \alpha^{k_2}) \cdots (Z - \alpha^{k_s})$ (6)

Note that for an arbitrary polynomial $Q(Z)$,

$Q(Z)(Z - \alpha^k) = Z\,Q(Z) - \alpha^k Q(Z)$ (7)

Such an operation involves polynomial shifts, scalar
multiplications and additions. Thus the multiplications by $(Z - \alpha^k)$
in Eq. (6) can be implemented by the systolic array given in
Fig. 2(b). Since the input stream contains zeros as well as $\alpha^k$’s,
it is used to control the updating of the latches in each
basic cell. At the end of the arrivals of the erasure locations,
the coefficients of $\Lambda(Z)$ are loaded from the latches into
registers and shifted out serially.
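
The shift, scale and add recurrence of Eq. (7) can be sketched in software as follows. This is only an illustration of the arithmetic, not a model of the systolic array timing. It assumes a 4-bit field GF(2^4) generated by x^4 + x + 1 with field elements held as integers 0..15; the field polynomial, the function names and the lowest-degree-first coefficient order are all assumptions made for the example.

```python
PRIM = 0b10011  # x^4 + x + 1, an assumed field polynomial for GF(16)


def gf_mul(a, b):
    """Multiply two GF(16) elements by shift-and-add with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0b10000:   # degree 4 reached: reduce by the field polynomial
            a ^= PRIM
        b >>= 1
    return result


def expand_erasure_locator(stream):
    """Expand the product of (Z - a) over the nonzero entries of `stream`.

    A zero entry stands for "no erasure at this position" and leaves the
    stored coefficients (the latches) unchanged.
    """
    q = [1]                                         # start from Q(Z) = 1
    for a in stream:
        if a == 0:
            continue
        shifted = [0] + q                           # Z * Q(Z)
        scaled = [gf_mul(a, c) for c in q] + [0]    # a * Q(Z)
        # Subtraction is the same as addition (XOR) in characteristic 2.
        q = [s ^ t for s, t in zip(shifted, scaled)]
    return q


# Erasures at the field elements 2 and 4, with zeros in between.
print(expand_erasure_locator([0, 2, 0, 4]))
```

For the stream `[0, 2, 0, 4]` the result is `[8, 6, 1]`, that is Z^2 + 6Z + 8, whose roots are the two erasure locations 2 and 4.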


IV. A New Architecture to Perform the 
Modified Euclidean Algorithm 

A systolic array was designed in Ref. 2 to compute the
error locator polynomial by a modified Euclidean algorithm.
The array required 2t cells, twice the number of correctable
errors. It is capable of performing the modified Euclidean
algorithm continuously.


In RS decoding with the modified Euclidean algorithm, only one
syndrome polynomial is computed in the time interval of one code word.
As a consequence, with the original architecture in Ref. 2,
a pipeline RS decoder is not as efficient as it might be: a
substantial portion of the systolic array is always idle. This
fact makes possible a more efficient design with fewer cells
and no loss in the throughput rate.

For the (N, I) RS code, the length of the syndrome
polynomial is N - I. The maximum length of the resultant Forney
syndrome polynomial is also N - I. Imagine now that a single
cell is used recursively to perform the successive steps of the
modified Euclidean algorithm instead of pipelining data to the
next cell. Then it would take N - I recursions to complete the
algorithm, where each recursion requires N - I symbol times.
Therefore, using a single cell recursively requires only a total
of $(N - I)^2$ symbol times to complete the modified form of
Euclid’s algorithm. Since a syndrome polynomial needs to
arrive every N symbol times, only $\lceil (N - I)^2 / N \rceil$ cells are
needed to process successive syndrome polynomials at a full
pipeline throughput rate.
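
As a quick check of this counting argument, the short Python sketch below compares the two cell counts. It assumes that the multiplexed count is the ceiling of (N - I)^2 / N, which is the reading that reproduces the (15,9) and (31,15) entries of Table 1; the function names are hypothetical.

```python
def cells_full_array(n, i):
    """Cells in the full systolic array: one per syndrome symbol, N - I."""
    return n - i


def cells_multiplexed(n, i):
    """Recursive cells needed when one polynomial arrives every N symbol times.

    Each polynomial occupies one cell for (N - I)^2 symbol times, so the
    number of cells is that work divided by N, rounded up (assumed ceiling).
    """
    work = (n - i) ** 2
    return (work + n - 1) // n   # integer ceiling of work / n


for n, i in [(15, 9), (31, 15), (255, 223)]:
    print((n, i), cells_full_array(n, i), cells_multiplexed(n, i))
```

The saving grows with the code rate because (N - I)^2 / N shrinks relative to N - I as N - I becomes small compared with N.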

Figure 3 shows the new alternate architecture design. The 
input multiplexer directs the syndrome polynomials to differ- 
ent cells. Each processor cell is almost identical to the cell 
presented in Ref. 2, except that it is used to process data 
recursively. 

The primary difference between the new cell structure and the
architecture of the previous cell (Ref. 2) is as
follows. Since division is avoided in the modified form of
Euclid’s algorithm, a scalar factor appears at the output.
Although such a scale factor, call it K, is irrelevant to the
problem of finding roots of the error locator polynomial
$\lambda(Z)$, it must be removed from the errata evaluator polynomial
$A(Z)$. In order to effectively utilize the processor cell given
in Ref. 2, the factor K which appears at the output of each
cell is calculated independently of the cell computation. This
is accomplished by using a multiplier, operating recursively,
to accumulate the product of all the nonzero leading
coefficients of the divisor polynomials. An inverse computation
circuit and a multiplier after the demultiplexer are used to
remove the unwanted scalar K from $K A(Z)$. This
computational process is illustrated in Fig. 3.
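
To illustrate why a scalar factor appears when division is avoided, the following Python sketch runs a division-free cross-multiplication reduction on two polynomials over GF(2^4). It is an illustrative gcd computation only and does not reproduce the decoder's modified Euclidean algorithm or its stopping rule. The field polynomial x^4 + x + 1, the function names and the lowest-degree-first coefficient order are assumptions made for the example.

```python
PRIM = 0b10011  # x^4 + x + 1, an assumed field polynomial for GF(16)


def gf_mul(a, b):
    """Multiply two GF(16) elements by shift-and-add with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0b10000:
            a ^= PRIM
        b >>= 1
    return result


def degree(p):
    """Degree of a coefficient list (lowest degree first); -1 for zero."""
    for i in range(len(p) - 1, -1, -1):
        if p[i]:
            return i
    return -1


def division_free_gcd(r, q):
    """Return a scalar multiple of gcd(r, q) using no field division."""
    r, q = list(r), list(q)
    while True:
        if degree(r) < degree(q):
            r, q = q, r                       # keep the higher degree in r
        if degree(q) < 0:
            return r[:degree(r) + 1]
        a, b = r[degree(r)], q[degree(q)]     # leading coefficients
        shift = degree(r) - degree(q)
        # b * r(Z) - a * Z^shift * q(Z): the leading terms cancel.
        new_r = [gf_mul(b, c) for c in r]
        for j in range(degree(q) + 1):
            new_r[j + shift] ^= gf_mul(a, q[j])
        r = new_r


# (Z - 2)(Z - 4) and (Z - 2)(Z - 8) share the factor (Z - 2).
print(division_free_gcd([8, 6, 1], [3, 10, 1]))
```

The call returns `[11, 12]`, that is 12Z + 11 = 12(Z - 2): the common factor comes out multiplied by a scalar (12 here) built up from the leading coefficients used in the cross multiplications. The root is unaffected, but recovering the unscaled polynomial would take one field inversion and a multiplication, which is the role the text assigns to the inverse circuit and multiplier.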

The architecture of the new basic cell is given in Fig. 4.
Compared with the previous systolic array design (Ref. 2),
the present scheme for multiplexing the recursive cell
computations significantly reduces the number of cells and, as a
consequence, the number of circuits. Table 1 shows that the
cell reduction is greater for high rate codes.


V. A Polynomial Evaluation Pipeline

Polynomials are evaluated not only in the Chien search
process, but also when the errata magnitudes are computed.
In RS decoding, one needs to evaluate

$A(Z) = \sum_{i=0}^{s+t-1} A_i Z^i$ (8)

for $Z = \alpha^k$ and $0 \le k \le N - 1$, given $A_i$, $0 \le i \le s + t - 1$. Note
that Eq. (8) has a form which is identical to the syndrome
computation Eq. (1).

However, in Eq. (8) the polynomial is shorter than in
Eq. (1). Also, since $N > s + t - 1$, Eq. (8) is evaluated over a
wider range than Eq. (1) is computed. These two differences
make it inefficient to implement Eq. (8) in a manner similar to
that used for syndrome computations. A better method is to
evaluate $A_i(\alpha^i)^k$ sequentially for each k at cell i. This is
illustrated in Fig. 5. The polynomial coefficient $A_i$ is multiplied by
$\alpha^i$ at the initialization of cell i. From then on a feedback loop
computes the quantities $A_i(\alpha^i)^k$ for $k = 1, 2, 3, \ldots, N - 1$.
The summation shown at the bottom of the figure is
implemented quite simply since all quantities are binary.
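
The feedback scheme described above can be sketched in Python as one register per coefficient, each repeatedly multiplied by its own constant, with the outputs combined by bitwise XOR. This is a behavioral sketch under stated assumptions, not a description of the circuit in Fig. 5: the field is taken to be GF(2^4) generated by x^4 + x + 1 with alpha = 2, and the function names and lowest-degree-first coefficient order are hypothetical.

```python
PRIM = 0b10011  # x^4 + x + 1, an assumed field polynomial for GF(16)


def gf_mul(a, b):
    """Multiply two GF(16) elements by shift-and-add with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0b10000:
            a ^= PRIM
        b >>= 1
    return result


def evaluate_pipeline(coeffs, n=15, alpha=2):
    """Return [A(alpha^k) for k = 1 .. n-1], one register per coefficient."""
    # Constant of cell i is alpha^i, built by repeated multiplication.
    consts = [1]
    for _ in range(1, len(coeffs)):
        consts.append(gf_mul(consts[-1], alpha))
    # Initialization: cell i holds A_i * alpha^i, the k = 1 term.
    regs = [gf_mul(c, m) for c, m in zip(coeffs, consts)]
    outputs = []
    for _ in range(1, n):
        total = 0
        for r in regs:
            total ^= r            # field addition is a bitwise XOR
        outputs.append(total)
        # Feedback loop: each register is multiplied by its own constant.
        regs = [gf_mul(r, m) for r, m in zip(regs, consts)]
    return outputs


# Z^2 + 6Z + 8 = (Z - 2)(Z - 4) has its roots at alpha^1 and alpha^2.
values = evaluate_pipeline([8, 6, 1])
print([k for k, v in enumerate(values, start=1) if v == 0])
```

Used as a Chien search, the positions k at which the output is zero mark the roots; here they are k = 1 and k = 2. The number of registers equals the number of coefficients, so the hardware cost follows the polynomial length rather than the code length N.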


VI. Conclusion 

An improved VLSI architecture of a pipeline Reed-Solomon
decoder is presented herein. Compared with the previous
design in Ref. 2, this architecture not only corrects
erasures but is also simpler, more regular, smaller in chip area, and
equally fast. It is estimated that the polynomial
expansion circuit and the polynomial multiplication circuit
need approximately the same number of transistors as the
syndrome computing pipeline. On the other hand, each
polynomial evaluation circuit takes about half the number of
transistors. Finally, each cell in the modified form of Euclid’s
algorithm circuit requires approximately the same chip area
as the syndrome circuit.

Based on a previous nMOS chip fabrication of the syn- 
drome pipeline (Ref. 3) and the design of the basic cell of the 
modified form of Euclid’s algorithm (Ref. 6), it is estimated 
that a (15,9) RS decoder chip would require about 29 thou- 
sand transistors. A (31,15) RS decoder would require about 
88 thousand transistors. Considering the presently existing 
VLSI technology, a high throughput 5-bit (31,15) RS decoder 
could be implemented readily on a single VLSI chip. Of course 
such a chip would have a possible immediate application to 
JTIDS (for Joint Tactical Information Distribution System 
of DoD). 


Table 1. Comparison of the number of cells required in the
modified Euclid’s algorithm computation

RS Code      Full Systolic Array    Multiplexing on Recursive Cells
(15,9)       6                      3
(31,15)      16                     9
(255,223)    32                     5

Fig. 1. VLSI architecture of a pipeline Reed-Solomon decoder for both errors and erasures correction